Development of Genomic Markers and Mapping Tools for Assembling the Allotetraploid Gossypium hirsutum L. Draft Genome Sequence
Cotton (Gossypium spp.) is the largest producer of natural textile fibers. Most worldwide and domestic cotton fiber production is based on cultivars of G. hirsutum L., an allotetraploid. Genetic improvement of cotton remains constrained by alarmingly low levels of genetic diversity, inadequate genomic tools for genetic analysis and manipulation, and the difficulty of effectively harnessing the vastly greater genetic diversity harbored by other Gossypium species. Development of large numbers of single nucleotide polymorphisms (SNPs) for use in intraspecific and interspecific populations will allow for cotton germplasm diversity characterization, high-throughput genotyping, marker-assisted breeding, germplasm introgression of advantageous traits from wild species, and high-density genetic mapping. My research has been focused on utilizing next generation sequencing data for intraspecific and interspecific SNP marker development, validation, and creation of high-throughput genotyping methods to advance cotton research.
I used transcriptome sequencing to develop and map the first gene-associated SNPs for five species: G. barbadense (Pima cotton), G. tomentosum, G. mustelinum, G. armourianum, and G. longicalyx. A total of 62,832 non-redundant SNPs were developed. These can be utilized for interspecific germplasm introgression into cultivated G. hirsutum, as well as for subsequent genetic analysis and manipulation. To create SNP-based resources for integrated physical mapping, I used BAC-end sequences (BESs) and resequencing data for 12 G. hirsutum lines, a Pima line, and G. longicalyx to derive 132,262 intraspecific and 693,769 interspecific SNPs located in BESs. These SNP data sets were used to help build the first high-throughput genotyping array for cotton, the CottonSNP63K, which now provides a standardized platform for global cotton research. I applied the array to two F2 populations and produced the first two high-density SNP maps for cotton, one intraspecific and one interspecific. By resequencing two interspecific F1 hypo-aneuploids, I also demonstrated that the chromosome-wide changes in SNP genotypes enable highly effective mass-localization of BACs to individual cotton chromosomes. These efforts provide additional validation and placement methods that can be directly integrated with the physical map being constructed for G. hirsutum and enable the production of a high-quality draft genome sequence for cultivated cotton.
Representing true plant genomes: haplotype-resolved hybrid pepper genome with trio-binning
As sequencing costs decrease and the availability of high-fidelity long-read sequencing increases, generating experiment-specific de novo genome assemblies becomes feasible. In many crop species, obtaining the genome of a hybrid or heterozygous individual is necessary for systems that do not tolerate inbreeding or for investigating important biological questions, such as hybrid vigor. However, most genome assembly methods that have been used in plants result in a merged single-sequence representation that is not a biologically accurate representation of either haplotype within a diploid individual. The resulting genome assembly is often fragmented and exhibits a mosaic of the two haplotypes, referred to as haplotype switching. Important haplotype-level information, such as causal mutations and structural variation, is therefore lost, causing difficulties in interpreting downstream analyses. To overcome this challenge, we have applied a method developed for animal genome assembly called trio-binning to an intraspecific hybrid of chili pepper (Capsicum annuum L. cv. HDA149 x Capsicum annuum L. cv. HDA330). We tested all currently available software for performing trio-binning, combined with multiple scaffolding technologies including Bionano, to determine the optimal method of producing the best haplotype-resolved assembly. Ultimately, we produced highly contiguous, biologically true haplotype-resolved genome assemblies for each parent, with scaffold N50s of 266.0 Mb and 281.3 Mb, with 99.6% and 99.8% positioned into chromosomes, respectively. The assemblies captured 3.10 Gb and 3.12 Gb of the estimated 3.5 Gb chili pepper genome size. These assemblies represent the complete genome structure of the intraspecific hybrid, as well as the two parental genomes, and show measurable improvements over the currently available reference genomes. Our manuscript provides a valuable guide on how to apply trio-binning to other plant genomes.
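The scaffold N50 values quoted in this abstract are a standard contiguity statistic. As a minimal sketch (not the authors' pipeline), N50 can be computed as the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name n50 and the toy lengths below are illustrative assumptions, not values from the paper.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Hypothetical helper for illustration: sorts lengths from longest to
    shortest and returns the first length at which the cumulative sum
    reaches at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid floating-point halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example (made-up lengths, in arbitrary units).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

A higher N50 for the same total length means that more of the assembly sits in long scaffolds, which is why the statistic is commonly reported alongside the fraction of sequence positioned into chromosomes.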
CitDet: A Benchmark Dataset for Citrus Fruit Detection
In this letter, we present a new dataset to advance the state of the art in detecting citrus fruit and accurately estimating yield on trees affected by the Huanglongbing (HLB) disease in orchard environments via imaging. Although significant progress has been made in solving the fruit detection problem, the lack of publicly available datasets has complicated direct comparison of results. For instance, citrus detection has long been of interest in the agricultural research community, yet there is an absence of work, particularly involving public datasets of citrus affected by HLB. To address this issue, we enhance state-of-the-art object detection methods for use in typical orchard settings. Concretely, we provide high-resolution images of citrus trees located in an area known to be highly affected by HLB, along with high-quality bounding box annotations of citrus fruit. Fruit on both the trees and the ground are labeled to allow for identification of fruit location, which contributes to advancements in yield estimation and a potential measure of HLB impact via fruit drop. The dataset consists of over 32,000 bounding box annotations for fruit instances contained in 579 high-resolution images. In summary, our contributions are the following: (i) we introduce a novel dataset along with baseline performance benchmarks on multiple contemporary object detection algorithms, (ii) we show the ability to accurately capture fruit location on tree or on ground, and finally (iii) we present a correlation of our results with yield estimations.
Comment: Submitted to IEEE Robotics and Automation Letters (RA-L)
Development and bin mapping of gene-associated interspecific SNPs for cotton (Gossypium hirsutum L.) introgression breeding efforts
BACKGROUND: Cotton (Gossypium spp.) is the largest producer of natural fibers for textiles and is an important crop worldwide. Crop production consists primarily of G. hirsutum L., an allotetraploid. However, elite cultivars express very small amounts of variation due to the species' monophyletic origin, domestication, and further bottlenecks due to selection. Conversely, wild cotton species harbor extensive genetic diversity of prospective utility to improve many beneficial agronomic traits, fiber characteristics, and resistance to disease and drought. Introgression of traits from wild species can provide a natural way to incorporate advantageous traits through breeding to generate higher-producing cotton cultivars and more sustainable production systems. Interspecific introgression efforts by conventional methods are very time-consuming and costly, but can be expedited using marker-assisted selection. RESULTS: Using transcriptome sequencing, we have developed the first gene-associated single nucleotide polymorphism (SNP) markers for wild cotton species G. tomentosum, G. mustelinum, G. armourianum and G. longicalyx. Markers were also developed for a secondary cultivated species, G. barbadense cv. 3–79. A total of 62,832 non-redundant SNP markers were developed from the five wild species, which can be utilized for interspecific germplasm introgression into cultivated G. hirsutum and are directly associated with genes. Over 500 of the G. barbadense markers have been validated by whole-genome radiation hybrid mapping. Overall, 1,060 SNPs from the five different species have been screened and shown to produce acceptable genotyping assays. CONCLUSIONS: This large set of 62,832 SNPs relative to cultivated G. hirsutum will allow for the first high-density mapping of genes from five wild species that affect traits of interest, including beneficial agronomic and fiber characteristics. Upon mapping, the markers can be utilized for marker-assisted introgression of new germplasm into cultivated cotton and in subsequent breeding of agronomically adapted types, including cultivar development. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-945) contains supplementary material, which is available to authorized users.
An anchored chromosome-scale genome assembly of spinach improves annotation and reveals extensive gene rearrangements in euasterids.
Spinach (Spinacia oleracea L.) is a member of the Caryophyllales family, a basal eudicot asterid that consists of sugar beet (Beta vulgaris L. subsp. vulgaris), quinoa (Chenopodium quinoa Willd.), and amaranth (Amaranthus hypochondriacus L.). With the introduction of baby leaf types, spinach has become a staple food in many homes. Production issues focus on yield, nitrogen-use efficiency and resistance to downy mildew (Peronospora effusa). Although genomes are available for the above species, a chromosome-level assembly exists only for quinoa, allowing for proper annotation and structural analyses to enhance crop improvement. We independently assembled and annotated genomes of the cultivar Viroflay using a short-read strategy (Illumina) and a long-read strategy (Pacific Biosciences) to develop a chromosome-level, genetically anchored assembly for spinach. Scaffold N50 for the Illumina assembly was 389 kb, whereas that for Pacific Biosciences was 4.43 Mb, representing 911 Mb (93% of the genome) in 221 scaffolds, 80% of which are anchored and oriented on a sequence-based genetic map, also described within this work. The two assemblies were 99.5% collinear. Independent annotation of the two assemblies with the same comprehensive transcriptome dataset shows that the quality of the assembly directly affects the annotation, with significantly more genes predicted (26,862 vs. 34,877) in the long-read assembly. Analysis of resistance genes confirms a bias in resistance gene motifs more typical of monocots. Evolutionary analysis indicates that Spinacia is a paleohexaploid with a whole-genome triplication followed by extensive gene rearrangements identified in this work. Diversity analysis of 75 lines indicates that variation in genes is ample for hypothesis-driven, genomic-assisted breeding enabled by this work.
Diversity analysis of cotton (Gossypium hirsutum L.) germplasm using the CottonSNP63K Array
Cotton germplasm resources contain beneficial alleles that can be exploited to develop germplasm adapted to emerging environmental and climate conditions. Accessions and lines have traditionally been characterized based on phenotypes, but phenotypic profiles are limited by the cost, time, and space required to make visual observations and measurements. With advances in molecular genetic methods, genotypic profiles are increasingly able to identify differences among accessions due to the larger number of genetic markers that can be measured. A combination of both methods would greatly enhance our ability to characterize germplasm resources. Recent efforts have culminated in the identification of sufficient SNP markers to establish high-throughput genotyping systems, such as the CottonSNP63K array, which enables a researcher to efficiently analyze large numbers of SNP markers and obtain highly repeatable results. In the current investigation, we have utilized the SNP array for analyzing genetic diversity primarily among cotton cultivars, making comparisons to SSR-based phylogenetic analyses, and identifying loci associated with seed nutritional traits. (Author's abstract)
There and back again: historical perspective and future directions for Vaccinium breeding and research studies
The genus Vaccinium L. (Ericaceae) contains a wide diversity of culturally and economically important berry crop species. Consumer demand and scientific research in blueberry (Vaccinium spp.) and cranberry (Vaccinium macrocarpon) have increased worldwide over the crops' relatively short domestication history (~100 years). Other species, including bilberry (Vaccinium myrtillus), lingonberry (Vaccinium vitis-idaea), and ohelo berry (Vaccinium reticulatum), are still largely harvested from the wild, but crop improvement efforts are underway. Here, we present a review article on these Vaccinium berry crops covering topics that span taxonomy to genetics and genomics to breeding. We highlight the accomplishments made thus far for each of these crops, along their journey from the wild, and propose research areas and questions that will require investments by the community over the coming decades to guide future crop improvement efforts. New tools and resources are needed to underpin the development of superior cultivars that are not only more resilient to various environmental stresses and higher yielding, but also produce fruit that continues to meet a variety of consumer preferences, including fruit quality and health-related traits.